
Add vLLM Qwen3.5-27B-FP8 support + reasoning field fix#78

Merged
galic1987 merged 2 commits into main from agent-20260315-123322
Mar 15, 2026

Conversation

@galic1987
Collaborator

Summary

  • Support vLLM "reasoning" field alongside existing "reasoning_content" (SGLang/llama.cpp) via serde alias — backward compatible
  • Handle "content": null in API responses when thinking mode consumes all tokens
  • Add dual RTX 4090 vLLM serving config to README (262K context, fp8 KV, hybrid Mamba/Attention)
  • SAB scorecard: 95/100 weighted (17 BLOOM, 2 GROW, 1 FROST) on Qwen3.5-27B-FP8

Test plan

  • cargo test --lib -- api::types — 32/32 pass
  • cargo test --lib -- api:: — 171/171 pass
  • Full SAB 20-scenario benchmark: 95/100 weighted BLOOM
  • Verified backward compat with SGLang reasoning_content field
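The dual RTX 4090 serving config mentioned in the summary might look like the sketch below. The flags (`--tensor-parallel-size`, `--max-model-len`, `--kv-cache-dtype`, `--gpu-memory-utilization`) are standard vLLM serve options; the model ID and the utilization value are assumptions, while the 262144 context length and fp8 KV cache come from this PR's summary:

```shell
# Sketch: serve Qwen3.5-27B-FP8 across two RTX 4090s.
# Model path and --gpu-memory-utilization value are assumptions.
vllm serve Qwen/Qwen3.5-27B-FP8 \
  --tensor-parallel-size 2 \
  --max-model-len 262144 \
  --kv-cache-dtype fp8 \
  --gpu-memory-utilization 0.90
```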

🤖 Generated with Claude Code

galic1987 and others added 2 commits March 15, 2026 12:33
- Support both "reasoning" (vLLM) and "reasoning_content" (SGLang/llama.cpp)
  response fields via serde alias
- Handle null content in API responses when model uses all tokens for thinking
- Add dual RTX 4090 vLLM serve config to README (262K context, fp8 KV cache)
- Add SAB benchmark config and scorecard: 95/100 weighted (17 BLOOM, 2 GROW)
- Update selfware.toml for local Qwen3.5-27B-FP8 endpoint

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@galic1987 galic1987 merged commit 1be0cf5 into main Mar 15, 2026
6 of 13 checks passed